
[FIX] Fitter: Fix infinite recursion in __getattr__ #1977

Merged
5 commits merged from fix-fitter-recursion into biolab:master on Feb 23, 2017

Conversation

pavlin-policar
Collaborator

Issue

Running fitters on large enough datasets through the Test & Score widget with cross validation selected would result in a stack-overflow error.

I am not particularly sure why this happened. My theory is that when Test & Score receives a large enough dataset, it runs cross validation in parallel, and the learner attributes were probably being accessed statically and failing.

Unfortunately, I have no idea what caused this or how exactly it was triggered, so I don't really know how to write a test against it. But since I will most likely change this code in the near future, I wouldn't spend too much time on it.

Description of changes

No longer rely on self.kwargs being set on the object; instead, fetch kwargs explicitly with __getattribute__, which raises a plain AttributeError when it is missing.
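
A minimal sketch of the idea (an illustrative class, not the actual Orange code):

class Fitter:
    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def __getattr__(self, item):
        # Writing self.kwargs here would re-enter __getattr__ whenever
        # 'kwargs' itself is missing (e.g. on an instance that pickle
        # rebuilds without calling __init__), recursing forever.
        # __getattribute__ raises a plain AttributeError instead.
        kwargs = self.__getattribute__('kwargs')
        try:
            return kwargs[item]
        except KeyError:
            raise AttributeError(item)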

Includes
  • Code changes
  • Tests
  • Documentation

@codecov-io

codecov-io commented Jan 29, 2017

Codecov Report

Merging #1977 into master will increase coverage by 19.21%.
The diff coverage is 81.81%.

@@             Coverage Diff             @@
##           master    #1977       +/-   ##
===========================================
+ Coverage   70.22%   89.43%   +19.21%     
===========================================
  Files         343       90      -253     
  Lines       54092     9190    -44902     
===========================================
- Hits        37984     8219    -29765     
+ Misses      16108      971    -15137

Last update 6ca41f6...49188ee.

@kernc
Contributor

kernc commented Jan 29, 2017

  1. joblib tries to pickle its arguments.
  2. Pickling accesses obj.__getstate__ for "instructions".
  3. The __getattr__ handler calls self.get_learner(None), because self.problem_type is not yet set (we haven't fitted anything yet).
  4. This raises AttributeError in get_learner().
  5. Python takes this to mean that obj doesn't have the attribute __getattr__ was called with, so it calls __getattr__ with it again.
    ...

I guess the comment in get_learner() is wrong and a TypeError should be raised instead. With this change, test_evaluation_testing.TestCrossValidation.test_njobs() with the learner set to one of the fitters no longer crashes.
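
The cycle is easy to reproduce with a self-referential __getattr__ (a minimal stand-in, not the actual fitter code):

class Broken:
    def __getattr__(self, item):
        # 'kwargs' was never set on the instance, so this line re-enters
        # __getattr__('kwargs'), which reads self.kwargs again, and so on.
        return self.kwargs[item]

# Any failed lookup now recurses; pickling triggers one through its
# probes for special methods such as __getstate__.
Broken().problem_type  # RecursionError: maximum recursion depth exceeded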

@pavlin-policar pavlin-policar force-pushed the fix-fitter-recursion branch 2 times, most recently from 32963d2 to ac6761c on January 29, 2017 17:25
@pavlin-policar
Collaborator Author

pavlin-policar commented Jan 29, 2017

Wow, thank you very much. I don't think I would've figured that out by myself. That makes the fix much cleaner :)

@pavlin-policar
Collaborator Author

I noticed this was still failing in some cases, so I've tried to make things a bit more straightforward.

Note: This is based on #1968 because I made some changes to the learner widget tests that enable testing fitters, which turned out to be useful here as well.

I've made two rather big changes from last time:

  1. To make Learners simpler in general, we now assume that they will always have a params attribute (this is noted in the main learner docstring, but I don't really have a good way to enforce it). The only change this required was to the Tree learners, since everything else already had params. It might be nicer to have all the learner params as actual attributes on the learners, which could probably be done easily, but this approach required fewer changes. @kernc, what do you think about this? The API on learners is still kind of a mess, but this could be an improvement.
  2. I've removed all state from Fitters. Storing the problem type on the fitter was causing all kinds of unpredictable behaviour and was difficult to understand, so I've removed that entirely. @janezd what do you think about this? This should make the fitters simpler.

This also fixes the error that @lanzagar pointed out last week.

@janezd janezd (Contributor) left a comment


I'm a bit confused by some changes related to #1986. Could you rebase onto master now that #1986 is merged?

@@ -64,6 +64,8 @@ def __init__(
self.min_samples_split = min_samples_split
self.sufficient_majority = sufficient_majority
self.max_depth = max_depth
self.params = {k: v for k, v in vars().items()
if k not in ('args', 'kwargs')}
janezd (Contributor)

What is in vars() -- except globals, which we don't like anyway?! I don't like this blind copying.

pavlin-policar (Collaborator, Author)

This was an attempt to make the Learner API more intuitive. Currently, Orange learners have their settings (or parameters) saved as attributes on the object itself, whereas sklearn learners have a special params field that stores all that data. This field is used heavily in the tests and only rarely in any other code, which makes dealing with the tests a bit confusing. It would probably be better to make the field protected and then do some magic with __getattr__ to make those fields accessible to the tests. If this were done, the API of the learners would be a bit more straightforward, but if you think it's unnecessary, I can remove it altogether. That would probably be cleaner than the blind copying I've done here.
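
Roughly like this (an illustrative sketch of the idea, not actual Orange code):

class Learner:
    def __init__(self, max_depth=None, min_samples_split=2):
        self._params = {'max_depth': max_depth,
                        'min_samples_split': min_samples_split}

    def __getattr__(self, item):
        # Fetch _params via __getattribute__ so that a missing '_params'
        # raises AttributeError instead of re-entering __getattr__.
        params = self.__getattribute__('_params')
        try:
            return params[item]
        except KeyError:
            raise AttributeError(item)

Tests could then keep writing learner.max_depth while the stored parameters stay in one protected dict.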

janezd (Contributor)

I still don't understand what the content of vars() is. It contains binarize, max_depth, min_samples_leaf, min_samples_split and sufficient_majority; if these need to be copied to params, I'd copy them explicitly. Besides, it probably contains self, and beyond that all global names in the module, which you don't want to copy.

I am probably missing something here.

pavlin-policar (Collaborator, Author)

No, you're not missing anything at all; I simply tried to emulate what the constructors in the sklearn learners do, which is to use vars(). I'll change it so the params are copied explicitly. I suppose the same issue is then present in all the sklearn learners; I could fix that in a separate PR if you'd like, since this one is kind of a mess already.

janezd (Contributor)

You're right, it's uglier (although perhaps better).

Leave it like this for now, and perhaps later change it in all SKL learners, e.g. by adding something like this to SKLLearner:

from inspect import signature

...

def update_params(self, values):
    # Copy only the names that are declared parameters of __init__;
    # vars() also contains 'self', which must not end up in params.
    param_names = signature(type(self).__init__).parameters
    self.params.update({name: values[name]
                        for name in param_names
                        if name not in {"self", "args", "kwargs"}})

which would then be called from __init__ of derived classes with self.update_params(vars()). This would only copy the names that were given as arguments.
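
For instance, a derived learner might do (a hypothetical class, assuming SKLLearner initializes an empty params dict and defines update_params() as above):

class TreeLearner(SKLLearner):
    def __init__(self, max_depth=None, min_samples_split=2):
        super().__init__()
        # vars() is __init__'s local namespace: 'self' plus the declared
        # arguments; update_params() keeps only the arguments.
        self.update_params(vars())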

I see that there's already some similar magic in our SKL wrappers.

Not in this PR.

@pavlin-policar
Collaborator Author

Sorry this has taken me so long to get to, but I'm at it again.

I'm guessing you meant changes related to #1968. The only thing these two have in common is some shared code to make the tests work with fitters.

@janezd
Contributor

janezd commented Feb 17, 2017

I will try to merge all your PRs today, so they can be a part of the next release, which is due soon.

The first three commits of this PR are already in master, so if you just rebase onto master, they will disappear here and make the code easier to review.

This occurred when a large dataset was used in the test and score widget
using cross validation. Changing the exception from AttributeError to
TypeError fixed this problem.
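
A sketch of the change that commit describes (get_learner() is named in the discussion above, but this body and the learners mapping are assumptions):

def get_learner(self, problem_type):
    if problem_type not in self.learners:
        # An AttributeError raised from here, inside a __getattr__ call,
        # makes Python report the originally requested attribute as
        # missing; a TypeError propagates as the genuine error it is.
        raise TypeError("No learner to handle problem type "
                        "{!r}".format(problem_type))
    return self.learners[problem_type]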
@pavlin-policar
Collaborator Author

Yeah, I'm sorry, I had forgotten to rebase; it's done now.

It is necessary to specify which problem type any param on the learner
is valid for, and I had forgotten to do this for the SVM learner. This
fixes that.
@ajdapretnar
Contributor

This fixes the issue I had when training a classification tree on employee attrition data.

@janezd janezd merged commit 9b3faf4 into biolab:master Feb 23, 2017
@pavlin-policar pavlin-policar deleted the fix-fitter-recursion branch February 23, 2017 22:55